Bibliographical Notes on Tree-Structured Classifiers
Abstract
The main source for the lecture on trees is Chapter 8, Sections 1 to 4 inclusive, of the textbook [5]. CART is described in the lovely book by Breiman, Friedman, Olshen and Stone [1], which contains an extensive discussion of all the main points we touched on during the lecture. A machine-learning point of view is given in Chapter 3 of Mitchell’s book [8], which is a good reference for ID3 and C4.5. Devroye, Györfi, and Lugosi devote Chapters 20 and 21 of their book [4] to the topic. As usual, they put substantial effort into consistency proofs for classifiers and provide interesting examples. Unlike the previous references, they also consider trees constructed using only the observations and not their labels. The material described in class came from two papers [3, 6]. The problem of matching impurity functions to loss functions was addressed by P.A. Chou in 1991 [3]. From this work we derived the analysis of the selection of optimal labels for nodes and leaves, and the theory of how to optimally choose splitting points. Chou’s paper also describes an algorithm that performs optimal (greedy) splitting in time linear in the number of samples, the number of classes, and the number of dimensions. The description of pruning methods provided during the lecture comes essentially from a paper by Esposito et al. [6]. Other pruning methods are also described in this work and in [7]. Two interesting conclusions were reached in [6]: the use of a pruning set is often a bad idea, and some common methods exhibit a marked tendency to overprune or underprune, which one would not easily infer from their description.